use weights_only in conversion script to prevent arbitrary code execution from model weights #32
Merged: ggerganov merged 1 commit into ggml-org:master on Mar 12, 2023
Conversation
This restricts malicious weights from executing arbitrary code by limiting the unpickler to loading only tensors, primitive types, and dictionaries.
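The risk being closed here is that `pickle` can invoke arbitrary callables while loading a file. A minimal stdlib-only demonstration of the attack shape (illustrative only, not llama.cpp or PyTorch code; `record` stands in for an attacker's payload such as `os.system`):

```python
import pickle

calls = []

def record(tag):
    """Stand-in for an attacker's payload (e.g. os.system)."""
    calls.append(tag)
    return tag

class MaliciousWeights:
    # __reduce__ tells pickle to call record("pwned") at load time,
    # instead of reconstructing the object's state
    def __reduce__(self):
        return (record, ("pwned",))

payload = pickle.dumps(MaliciousWeights())
pickle.loads(payload)  # record() runs during unpickling
print(calls)           # ['pwned']
```

Nothing in `pickle.loads` distinguishes this from a legitimate checkpoint, which is why restricting the unpickler (as `weights_only=True` does) matters.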
Contributor
@deepdiffuser I do support this change, but now I get Any ideas?
Contributor (Author)
What version of PyTorch? I believe you need 1.13.1 for this arg.
Contributor
Ah, I see. conda gets you 1.12.1. Let's keep this thread for posterity.
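Since `torch.load` only accepts `weights_only` from PyTorch 1.13 onward, a conversion script could gate on the installed version rather than fail on 1.12.1. A hedged sketch; `supports_weights_only` is a hypothetical helper, not part of this PR:

```python
def supports_weights_only(torch_version: str) -> bool:
    """True if this PyTorch version accepts torch.load(..., weights_only=True).

    The argument was added in PyTorch 1.13, so 1.12.1 (the conda default
    mentioned above) returns False.
    """
    # Strip local/build suffixes like "+cu117" before parsing
    base = torch_version.split("+")[0]
    major, minor = (int(p) for p in base.split(".")[:2])
    return (major, minor) >= (1, 13)

print(supports_weights_only("1.12.1"))  # False
print(supports_weights_only("1.13.1"))  # True
```

A caller would then pass the flag conditionally, e.g. `torch.load(f, map_location="cpu", weights_only=True)` when `supports_weights_only(torch.__version__)` holds, falling back to a plain `torch.load` (and losing the protection) otherwise.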
flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request on Jun 27, 2023:
Cuda performance broadcast
Alcpz added a commit to Alcpz/llama.cpp that referenced this pull request on Dec 1, 2025:
Cherry pick of ggml-org#32
Co-authored-by: Yuri Khrustalev <[email protected]>
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request on Dec 29, 2025:
* Zen4 flash attention: moving useful parts from the kq_fused_softmax branch
* Add flash attention with soft-cap and fix D = 256 case
* Flash attention refinements
* Update FlashAttn comment
Co-authored-by: Iwan Kawrakow <[email protected]>
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request on Mar 26, 2026:
…n data Part of ggml-org#32: turbo3 prefill degrades relative to q8_0 with context length. Changes so far:
- Skip ggml_cont when tensors already contiguous (+1%, minimal)
- Generated 32x32 rotation matrices (turbo-rotation-data-32.h) for reduced group size approach (16x less matmul compute)
- Fixed V un-rotation to check v->type not k->type
Next: update QK_TURBO3_GROUP, Metal WHT kernel, and KV cache for d=32.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: [email protected]
didlawowo pushed a commit to didlawowo/llama.cpp that referenced this pull request on Mar 27, 2026:
…n data Part of ggml-org#32: turbo3 prefill degrades relative to q8_0 with context length. Changes so far:
- Skip ggml_cont when tensors already contiguous (+1%, minimal)
- Generated 32x32 rotation matrices (turbo-rotation-data-32.h) for reduced group size approach (16x less matmul compute)
- Fixed V un-rotation to check v->type not k->type
Next: update QK_TURBO3_GROUP, Metal WHT kernel, and KV cache for d=32.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: [email protected]
This restricts malicious weights from executing arbitrary code by limiting the unpickler to loading only tensors, primitive types, and dictionaries.
See the torch.load docs: https://pytorch.org/docs/stable/generated/torch.load.html
I tested this and it behaves the same as before.
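The mechanism behind `weights_only` can be illustrated with the stdlib `pickle` module alone: an `Unpickler` subclass overrides `find_class` and rejects any global not on an allow-list. This is an illustrative sketch under that assumption, not PyTorch's actual implementation (PyTorch's real allow-list also covers tensor and storage types):

```python
import io
import pickle

# Illustrative allow-list of (module, name) pairs; PyTorch's real one
# additionally permits tensor/storage reconstruction machinery
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "int"),
    ("builtins", "float"),
    ("builtins", "str"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve globals explicitly on the allow-list
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden"
        )

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain dicts of primitives load fine...
print(restricted_loads(pickle.dumps({"n_layers": 32})))

# ...but a payload that smuggles in a callable is rejected
class Evil:
    def __reduce__(self):
        return (print, ("arbitrary code ran",))

try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

The key property is that the forbidden global is rejected before it is ever called, which is what stops the load-time code execution shown in the PR description.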